SPECIFICATION



TITLE 



METHOD FOR CHARACTER SEPARATION IN TEXT RECOGNITION TASKS 



BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to a method for character separation in text recognition
tasks.

Description of the Related Art

In the automatic recognition of texts, that is to say when converting the graphic
information of a document into text characters which can be further processed by
means of electronic text processing programs, an essential precondition for a
successful recognition operation is the precise determination of the position and the
size of the individual characters. In the case of originals with poor lettering or fonts with
very narrow character spacing, this determination is problematic, inter alia, because the
characters are interconnected and "grow together", and can therefore no longer be
separated using conventional methods such as simple contour tracking.

SUMMARY OF THE INVENTION

The invention is therefore based on the object of specifying an improved method
for separating interconnected characters.

This object is achieved according to the invention by a method in
which possible points of intersection are determined for the extracted objects
under examination by means of white space analysis and angle analysis, in which
plausible separating lines are determined from the points of intersection and
corresponding mating points, and in which the objects separated in this way are
subjected to a classification process and the final separation is performed on the basis
of the results.

A refinement of the method is advantageous in which, when there are more than three
possible points of intersection, a first cut is made through the point of intersection
that is fourth from the left-hand start of the character. The reason for this is that
no conventional text character of the Latin script has more than three white spaces.

It is also favorable when, after a first cut at a first possible point of
intersection and a subsequent unsuccessful attempt at classification, the left-hand
neighboring point of intersection situated closest to the first possible point of intersection
is used as the basis for a further attempt at separation.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows an illustration relating to the white space analysis of an image.

Figure 2 shows an illustration relating to the actual character separation.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The sequence of the method according to the invention is as follows:

The method is started in the recognition operation after the position of the
line has been determined. A white space analysis is already carried out while the
outline of a character or of a plurality of connected characters is being determined
by contour tracking. An angle analysis is performed once the complete contour is available.

White space analysis and angle analysis are used to determine possible points
of intersection, which supply possible separating lines in conjunction with mating points.

The points of intersection are examined with regard to their plausibility.
In the process, it is determined which character sequences contain the present
combination of white spaces. Thus, for example, the letter sequence WV contains the
following white spaces: TOP-BOTTOM-TOP-BOTTOM-TOP. Here, TOP (BOTTOM)
denotes a white space which is open at the top (bottom). This knowledge of the
letters is now used to perform the first separation through the point of intersection of the
fourth white space.

It is then determined to what extent the separation of the object along the
separating lines passing through plausible points of intersection leads to plausible
classification results. In other words, the separated characters or parts of characters are
subjected to a recognition operation, for example by means of a neural network, and the
separation is accepted if this operation leads to a satisfactory result, that is, a character
recognized with high reliability. Otherwise, the separation is repeated along other
separating lines until there is a satisfactory result.
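The trial-and-error loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `choose_cut`, the callback `classify_left_part`, the threshold `min_confidence` and the choice to start at the rightmost candidate when there are three or fewer candidates are all hypothetical. Only the "fourth from the left" starting rule and the fallback to the nearest left-hand neighbor are taken from the text.

```python
from typing import Callable, Optional, Sequence, Tuple


def choose_cut(
    cut_xs: Sequence[int],
    classify_left_part: Callable[[int], Tuple[str, float]],
    min_confidence: float = 0.9,
) -> Optional[Tuple[int, str]]:
    """Try candidate cuts until the left-hand part is recognized reliably.

    cut_xs: candidate points of intersection, ordered from left to right.
    classify_left_part: hypothetical callback that separates the object at
        the given candidate and returns (label, confidence) for the part
        to the left of the cut.
    """
    if not cut_xs:
        return None
    # With more than three candidates, start at the fourth from the left.
    # Starting at the rightmost candidate otherwise is an assumption.
    start = 3 if len(cut_xs) > 3 else len(cut_xs) - 1
    # After an unsuccessful classification, move to the nearest
    # left-hand neighbor and try again.
    for idx in range(start, -1, -1):
        label, confidence = classify_left_part(cut_xs[idx])
        if confidence >= min_confidence:
            return cut_xs[idx], label
    return None
```

In practice the callback would wrap the actual cutting step and the neural network classifier; here it is left abstract so that the control flow stays visible.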

Neural networks are mathematical models which simulate the structure of the
human brain. They comprise neurons, which are essentially summing elements with
weighted inputs and a nonlinear amplifier component, and which are combined to form a
parallel network typically having two levels. A detailed description of the feedforward
neural networks used in the exemplary embodiment is to be found, for example, in
"Layered Neural Nets for Pattern Recognition", B. Widrow, R.G. Winter, R.A. Baxter;
IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 36, No. 7, July
1988.

Pattern recognition by means of a neural network is performed using the method
described in "A rotation, scaling, and translation invariant pattern classification system",
C. Yuceer, K. Oflazer; Pattern Recognition, Vol. 26, No. 5, pp. 687-710, 1993.

The white space analysis is described in more detail with the aid of figure 1. The
figure shows the two interconnected letters r and f, which have a white space W. Here,
white space W means a white interspace which is bounded on three sides, has a certain
depth and whose open side is directed upward or downward. This white space W is
detected during the tracking of the contour of the (grown-together) character
when the contour line C crosses two prescribed threshold values SW in both
directions. If, as in the example, there is a white space W which is open downward, the
highest point of the contour line C is defined as a possible point of intersection S;
in the case of a white space which is open upward, it is the lowest point.

The sequence of the angle analysis performed thereupon is as follows:

Two vectors, for which it holds that

$$\vec{A} = \overrightarrow{C[i]\,C[i-5]} \quad\text{and}\quad \vec{B} = \overrightarrow{C[i]\,C[i+5]},$$

are determined in each case from three points on the contour line C.

The angle between the two vectors is calculated. The angle is entered into a list
if it is right-to-left with an absolute value of less than 80° and a vertex (C[i]) either
upward or downward.

If this condition is fulfilled for a plurality of juxtaposed vector pairs, only the angle
with the smallest absolute value is tracked further.

The angles entered in the list are now examined as to whether an angle of
opposite orientation to the vertex is present on the opposite side of the contour line. If
this is the case, the angle pair thus formed is stored as the position of a possible
point of intersection.

The sequence in the determination of the angle between two vectors which are
defined by three points from the contour line ($C_1$: $x_1/y_1$, $C_2$: $x_2/y_2$, $C_m$: $m_x/m_y$) is
described below. The x and y components of the two vectors are determined therefrom:

$$A_x = x_1 - m_x;\quad A_y = y_1 - m_y;\quad B_x = x_2 - m_x;\quad B_y = y_2 - m_y.$$

The angle between the vectors A and B is calculated as follows: firstly, the angle
of A to the x-axis is determined, and then the angle of B to the x-axis:

$$\text{Angle} = \arccos(\ldots)$$

$$\text{Angle (in degrees)} = \frac{\text{Angle (in rad)} \cdot 180}{\pi}$$

$$\text{Angle} = 360^\circ - \text{angle } B + \text{angle } A$$

(if the angle is greater than 360°, the angle is corrected by 360°).

The determination of the direction of the angle vertex is based on the
consideration that in the case of a downward directed vertex the y-coordinates of the
points $C_1$ and $C_2$ are smaller than the y-coordinate of $C_m$.

In the case of an upwardly directed vertex, the y-coordinates of the points $C_1$ and
$C_2$ must be greater than the y-coordinate of $C_m$.
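The angle computation and the vertex test can be sketched as follows. Because the argument of the arccos expression is not legible in the text, this sketch swaps in `math.atan2` to obtain the angle of each vector to the x-axis over the full range of 0° to 360°; that substitution, the function names and the use of `>=` for the 360° correction are assumptions. The combination `360 - angle B + angle A` and the y-coordinate comparisons follow the description.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def axis_angle_deg(dx: float, dy: float) -> float:
    """Angle of the vector (dx, dy) to the x-axis in degrees, in [0, 360)."""
    return math.degrees(math.atan2(dy, dx)) % 360.0


def angle_between(p1: Point, p2: Point, vertex: Point) -> float:
    """Directed angle formed at `vertex` by the vectors to p1 (A) and p2 (B)."""
    ax, ay = p1[0] - vertex[0], p1[1] - vertex[1]
    bx, by = p2[0] - vertex[0], p2[1] - vertex[1]
    angle = 360.0 - axis_angle_deg(bx, by) + axis_angle_deg(ax, ay)
    # Correct by 360 degrees when the sum leaves the range [0, 360).
    if angle >= 360.0:
        angle -= 360.0
    return angle


def vertex_direction(p1: Point, p2: Point, vertex: Point) -> Optional[str]:
    """Classify the vertex by comparing y-coordinates as in the text."""
    if p1[1] < vertex[1] and p2[1] < vertex[1]:
        return "down"  # both outer points have smaller y than the vertex
    if p1[1] > vertex[1] and p2[1] > vertex[1]:
        return "up"    # both outer points have greater y than the vertex
    return None        # neither condition holds
```

A caller would evaluate `angle_between(C[i-5], C[i+5], C[i])` for each contour index `i` and keep only those results that satisfy the angle condition and have a direction other than `None`.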

Owing to the characteristics of the printed text and the limited image
resolution, the angles between two vectors determined in the way described
necessarily first become increasingly smaller in the region of a kink in the
contour of a character, depending on the position under consideration, and thereafter
continuously increase again. Consequently, only the respective minimum angle of
such a range is used for the further evaluation.
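Selecting the minimum angle of each such range can be sketched as below. The sketch looks only at the magnitude condition (absolute value below a limit, 80° by default as in the angle analysis) and ignores the orientation and vertex conditions; the function name `minima_of_runs` and the list-of-angles input format are assumptions.

```python
from typing import List, Optional, Tuple


def minima_of_runs(angles: List[float], limit: float = 80.0) -> List[Tuple[int, float]]:
    """Keep one (index, angle) pair per run of consecutive small angles.

    angles: one angle per contour index, in contour order.
    A run is a maximal stretch of neighboring indices whose absolute
    angle is below `limit`; only its smallest absolute angle is kept.
    """
    result: List[Tuple[int, float]] = []
    best: Optional[Tuple[int, float]] = None  # sharpest angle of the current run
    for i, angle in enumerate(angles):
        if abs(angle) < limit:
            if best is None or abs(angle) < abs(best[1]):
                best = (i, angle)
        elif best is not None:
            # The run has ended; store its minimum and start over.
            result.append(best)
            best = None
    if best is not None:
        result.append(best)
    return result
```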

In order to fix a possible separating line, it is now necessary to determine for
each possible point of intersection C(Nr) a corresponding mating point on the opposite
branch of the contour line C(i), i = 0, ..., contour Nr.

For this purpose, a straight line is laid through the two points C(Nr-1) and C(Nr+1)
adjacent to the possible point of intersection C(Nr) on the contour line, and the normal
to this straight line is determined. The points adjacent to the point of intersection of this
normal with the opposite branch of the contour line are examined with regard to their
spacing value from the possible point of intersection and the normal, and the contour
point with the minimum spacing value is defined as mating point C(g), and thus as the
second point of the possible separating line. The mathematical definition of this
operation is as follows:

$$n_x = C(Nr+1)_x - C(Nr-1)_x$$

$$n_y = C(Nr+1)_y - C(Nr-1)_y$$

$$\text{spacing} = \sqrt{\left(C(Nr)_x - C(i)_x\right)^2 + \left(C(Nr)_y - C(i)_y\right)^2}$$

$$\text{spacing relative to } g = \left|\, n_x \left(C(i)_x - C(Nr)_x\right) + n_y \left(C(i)_y - C(Nr)_y\right) \right|$$

$$\text{spacing value} = \text{spacing} + \text{spacing relative to } g$$

$$C(g) = C(i) \;\big|\; \text{spacing value}\left(C(g), C(Nr)\right) = \min$$
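The selection of the mating point can be sketched as below. The sketch assumes one reading of the formulas: the spacing is the Euclidean distance to C(Nr), and the spacing relative to the normal is the unnormalized absolute value of the dot product between (nx, ny) and the offset from C(Nr). The function name `mating_point` and the explicit `candidates` list of indices on the opposite branch are hypothetical; how those candidates are found is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mating_point(contour: Sequence[Point], nr: int, candidates: Sequence[int]) -> int:
    """Return the candidate index with the minimum spacing value.

    contour: contour points in tracking order.
    nr: index of the possible point of intersection C(Nr).
    candidates: indices of points on the opposite branch of the contour.
    """
    px, py = contour[nr]
    # Direction of the straight line through the two neighbors of C(Nr).
    nx = contour[nr + 1][0] - contour[nr - 1][0]
    ny = contour[nr + 1][1] - contour[nr - 1][1]

    def spacing_value(i: int) -> float:
        cx, cy = contour[i]
        spacing = math.hypot(px - cx, py - cy)
        # Zero exactly when C(i) lies on the normal through C(Nr).
        spacing_to_normal = abs(nx * (cx - px) + ny * (cy - py))
        return spacing + spacing_to_normal

    return min(candidates, key=spacing_value)
```

Because the second term is not divided by the length of (nx, ny), it weights deviation from the normal more strongly when the two neighbors are far apart; whether the original formula normalizes this term cannot be told from the text.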

The actual separation is explained with the aid of figure 2. The basis of the
separation is the contour line of the extracted character. In a first step, a separating line
buffer is initialized with 0, which corresponds to a perpendicular line at the left-hand
edge. Thereafter, the point situated furthest to the right (the x-value maximum) is
determined on the section 1 of the contour line between 0 and the point of
intersection on which the separation is based. The point situated furthest to the right
(the x-value maximum) is likewise determined on the branch 2 of the contour line from
the mating point up to the end of the contour and on the separating line 3.

The maximum x-values collected in this way constitute the extreme right-hand
edge of the character used for the classification.
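Finding the right-hand edge can be sketched as below, assuming that the contour is stored as a list of (x, y) points starting at the left-hand edge and that the point of intersection comes before the mating point in tracking order. The function name `right_edge` and both index parameters are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]


def right_edge(contour: Sequence[Point], cut_idx: int, mate_idx: int) -> int:
    """Largest x-value of the part of the character left of the separating line.

    cut_idx: index of the point of intersection on the contour.
    mate_idx: index of the mating point on the contour (after cut_idx).
    """
    first_branch = contour[: cut_idx + 1]  # section 1: start up to the cut point
    second_branch = contour[mate_idx:]     # section 2: mating point to the end
    # The separating line 3 is straight, so its largest x-value lies at one
    # of its two end points, which are already part of the two branches.
    return max(x for x, _ in (*first_branch, *second_branch))
```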

Although other modifications and changes may be suggested by those skilled in 
the art, it is the intention of the inventors to embody within the patent warranted hereon 
all changes and modifications as reasonably and properly come within the scope of 
their contribution to the art.